Sea Level Monitor analysis

The Sea Level Monitor methodology is described in detail in Deltares (2023). Principles are:

previous_df <- readSeaLevelData(file.path("../../data/deltares/results/dutch-sea-level-monitor-export-stations-2023-11-20.csv")) %>%
  filter(station %in% params$station)

Get data from PSMSL

Annual average sea level data for the Dutch main stations is downloaded from the Permanent Service for Mean Sea Level site and combined with the Global Tide and Surge Model (GTSM) annual average surge values. In case PSMSL data is not available for the most recent year (2024-1)

# Get data from PSMSL data service
rlr_df <- read_yearly_psmsl_csv(mainstations_df$psmsl_id, filepath = "../../") 

In this analysis, measurements over the period 1890 to 2023 are considered.

if(params$monitoryear-1 == max(rlr_df$year)){
  cat("Mean annual sea level downloaded from PSMSL are availabale up to ", max(rlr_df$year), ", the time series is up to date. ")
} else {
  cat("Mean annual sea level downloaded from PSMSL are only available up to ", max(rlr_df$year), " and thus incomplete for the current analysis. In order to do a preliminary analysis, measurements from Rijkswatersataat Data Distribution Layer will be used for missing year(s). ")
}

Mean annual sea level downloaded from PSMSL are availabale up to 2023 , the time series is up to date.

# Check if PSMSL is up to date for analysis year

if(max(rlr_df$year) == params$monitoryear-1){
  refreshed_df <- rlr_df
  print("PSMSL time series is up-to-date and is used for analysis")
} else {
  print("The PSMSL time series is not complete. An attempt is made to complete the data using RWS DDL.")
  if(max(rlr_df$year) == params$monitoryear-2){ # robuuster maken. nu alleen check voor een ontbrekend jaar.
    # Check if ddl data from required year exist
    required_file <- file.path("../../data/rijkswaterstaat/ddl/annual_means/", paste0(params$monitoryear-1, ".csv"))
    
    if(file.exists(required_file)){
      ddl_datayear <- read_csv2(required_file)
      
    } else{
      cat("DDL data for year", params$monitoryear-1, "is not available", sep = " ")
    }
  }
}
FALSE [1] "PSMSL time series is up-to-date and is used for analysis"
# Get GTSM data from local file
gtsm <- read_yearly_gtsm(filename = "../../data/deltares/gtsm/gtsm_surge_annual_mean_main_stations.csv") |>
  mutate(year = year(ymd(t)))
# convert rlr tot nap2005
    refreshed_df <- rlr_df |> 
      mutate(
        height = rlr_height_mm - as.numeric(`nap-rlr`),
      ) |> 
      rename(station = name)

# check if psmsl data need to be completed with ddl data
    try(
      if(
        exists("ddl_datayear") & 
        !unique(ddl_datayear$year) %in% unique(refreshed_df$year)
      ){
        refreshed_df <- refreshed_df |> bind_rows(ddl_datayear)
      },
      silent = T
    )

    refreshed_df <- refreshed_df |>
      left_join(gtsm, by = c(station = "name", year = "year")) |>
      mutate(
        surge_anomaly = case_when(
          year >= 1950 ~ (1000 * surge - mean(1000 * surge, na.rm = T)), # meters to millimeters
          year < 1950 ~ 0
        )
      ) |>
      select(
        year,
        height,
        station,
        surge_anomaly
      ) %>%
      bind_rows(
        . |>
          group_by(year) |>
          summarise(
            height = mean(height, na.rm = T),
            surge_anomaly = mean(surge_anomaly, na.rm = T)
          ) |>
          mutate(
            station = "Netherlands"
          )
      ) %>%
      bind_rows(
        . |>
          filter(station %in% c("Vlissingen", "Hoek van Holland", "Den Helder", "Harlingen", "IJmuiden")) |>
          group_by(year) |>
          summarise(
            height = mean(height, na.rm = T),
            surge_anomaly = mean(surge_anomaly, na.rm = T)
          ) |>
          mutate(
            station = "Netherlands (without Delfzijl)"
          )
      ) |>
      addBreakPoints() %T>% 
      write_csv2("../../data/deltares/results/dutch-sea-level-monitor-export-stations-latest.csv") %>%
      filter(year >= 1890)

Compare with previous years analysis data

# previous year did not include gtsm for year 1950.
# Therefore comparison of surge is done for years > 1950

# range(df$surge_anomaly)
# range(refreshed_df$surge_anomaly, na.rm = T)

refreshed_df_filter <- refreshed_df %>% filter(year > 1950 & year < 2023)

ggplot() +
  geom_point(
    data = previous_df, 
    aes(x = year, y = height),
    shape = "+"
  ) +
  geom_point(
    data = refreshed_df_filter, 
    aes(x = year, y = height), 
    color = "blue", 
    shape = 21, 
    fill = "transparent",
    size = 1
  ) +
  facet_wrap(c("station"), ncol = 3)

ggplot() +
  geom_point(
    data = previous_df, 
    aes(x = year, y = surge_anomaly)
    ) +
  geom_point(
    data = refreshed_df_filter, 
    aes(x = year, y = surge_anomaly), 
    color = "blue", 
    shape = 21, 
    fill = "transparent",
    size = 1
    ) +
  facet_wrap(c("station"), ncol = 3)

Locations of the main stations

This document analyses the sea level trend of the main stations in the Netherlands using different models. Based on geographical coverage and available time series length six stations are considered to be “main tide gauge stations”.

Additionally, to calculate the average sea level and sea level trend along the Dutch Coast, the main station sea level is averaged for each year (virtual station “Netherlands”). Because the station “Delfzijl” has a considerable gap in vertical adjustment for subsidence, we also consider a variant in which Delfzijl is omitted from the main analysis selection (“Netherlands (without Delfzijl)”).

map_stations(df = refreshed_df, mainstations_df = mainstations_df, mainstations_locs = mainstations_locs)

Hoofdgetijstations in Nederland. Er is aangegeven welke stations zijn meegenomen in dit rekendocument.

Sea level measurements

In this section we look at sea level measurements. The global collection of tide gauge records at the PSMSL was used to access the data. There are two types of datasets the “Revised Local Reference” and “Metric”. For the Netherlands the difference is that the “Revised Local Reference” undoes the corrections from the NAP correction in 2005, to get a consistent dataset. Here we transform the RLR back to NAP (without undoing the correction).

The rlrnap computes the rlr back to latest NAP (ignoring the undoing of the NAP correction) the alpha paramater is the dominant wind direction for the stations, based on de Ronde 2013. Id’s are the station ids in the PSMSL dataset. They may change from year to year as the PSMSL 0 point is arbitary. You can lookup the relevant parameters in the schematic diagram like this LRL diagram for station Vlissingen

knitr::kable(mainstations_df[,c('name', 'psmsl_id', 'msl-rlr', 'msl-nap', 'nap-rlr')], caption = "PSMSL inforamtion on the six mains tide gauge stations in the Netherlands.")
PSMSL inforamtion on the six mains tide gauge stations in the Netherlands.
name psmsl_id msl-rlr msl-nap nap-rlr
Vlissingen 20 6976 46 6930
Hoek van Holland 22 6987 114 6873
Den Helder 23 6962 16 6946
Delfzijl 24 6953 130 6823
Harlingen 25 7024 110 6914
IJmuiden 32 7014 64 6950

Sea level measurements for the six main stations (yearly average) are shown in figure @ref(fig:zeespiegelmetingen).

p <- refreshed_df %>%
  dplyr::filter(!grepl("Netherlands", station)) %>%
ggplot(aes(year, height)) +
  geom_point(alpha = 1, aes(color = station), shape = 21, fill = "white", size = 1) +
  geom_line(alpha = 0.5, aes(color = station), linewidth = 0.5) +
  xlab("jaar") + ylab("gemeten zeespiegel in mm") +
  theme_light() +
  theme(legend.position = "bottom")

ggplotly(p) %>% layout(legend = list(x = 0.05, y = 0.95))

Jaarlijks gemiddelde zeespiegel voor de zes hoofdstations langs de Nederlandse kust.

# p
p <- refreshed_df %>%
  dplyr::filter(grepl("Netherlands", station)) %>%
ggplot(aes(year, height)) +
  geom_point(alpha = 1, aes(color = station), shape = 21, fill = "white", size = 1) +
  geom_line(alpha = 0.5, aes(color = station), linewidth = 0.75) +
  xlab("jaar") + ylab("gemeten zeespiegel in mm") +
  theme_light() +
  theme(legend.position = "bottom")

ggplotly(p) %>% layout(legend = list(x = 0.05, y = 0.95))

Jaarlijks gemiddelde zeespiegel voor gemiddelde van stations langs de Nederlandse kust.

# p

Sea level high years

refreshed_df %>%
  filter(year >= 1900) %>%
  dplyr::filter(station == "Netherlands (without Delfzijl)") %>%
  dplyr::arrange(-height) %>%
  dplyr::select(year, station, height_mm = height) %>%
  # dplyr::group_by(station) %>%
  dplyr::slice(c(1:5)) %>%
  knitr::kable(caption = "Overview of the five highest yearly average water levels for the combined station Netherlands (without Delfzijl) in mm during the considered period. ")
Overview of the five highest yearly average water levels for the combined station Netherlands (without Delfzijl) in mm during the considered period.
year station height_mm
2023 Netherlands (without Delfzijl) 152.8
2020 Netherlands (without Delfzijl) 96.6
2022 Netherlands (without Delfzijl) 94.6
2017 Netherlands (without Delfzijl) 94.2
2019 Netherlands (without Delfzijl) 92.0

Storm surge

The expected storm surge per year is determined using output of the Global Tide and Surge Model (GTSM) (ref). This calculates the surge given the bathymetry and climatic conditions. Model results are available from 1950 onwards (figure @ref(fig:gtsm-surge)). The model is run each successive year to calculate the annual average wind and air pressure for use in the Sea Level Monitor. The calculated variation in surge (surge anomaly) is subtracted from the measured sea level before the sea level rise is calculated. For the years before 1950, no runs are available and sea level is corrected for average surge only. This correction reduces the variation due to differences in surge per year and allows a more precise estimate of the long-term trend to be made.

p <- refreshed_df %>%
  # filter(!grepl("Netherlands", station)) %>%
  # filter(station == "IJmuiden") %>%
ggplot(aes(year, surge_anomaly)) +
  geom_point(alpha = 1, aes(color = station), shape = 21, fill = "white", size = 1) +
  geom_line(alpha = 0.5, aes(color = station), linewidth = 0.75) +
  xlab("jaar") + ylab("windopzet in mm") +
  theme_light() +
  theme(legend.position = "bottom") +
  coord_cartesian(xlim = c(1945, params$monitoryear))

# ggplotly(p) %>% layout(legend = list(x = 0.05, y = 0.95))
p
Modelled surge anomaly (deviation of storm surge from long year avearge). The yearly averages surge is calculated for 1950 - now. For earlier years, an average surge is assumed.

Modelled surge anomaly (deviation of storm surge from long year avearge). The yearly averages surge is calculated for 1950 - now. For earlier years, an average surge is assumed.

Trend analysis

The sea level trend is calculated using generalized linear regression. the following models are tested (Deltares 2018 and Deltares 2023):

  • linear model
  • broken linear model (with breakpoint at 1993)
  • broken quadratic model (breakpoint at 1960)

The trend is calculated for all stations individually, for the mean of all stations (“Netherlands”), and for means of all stations minus station Delfzijl (“Netherlands without Delfzijl”).

In the regression equation, nodal tide is one of the components and is estimated as a sinusoid curve with a period of 18.6 years.

byStation <- refreshed_df %>%
  dplyr::group_by(station) %>%
  tidyr::nest() %>%
  dplyr::ungroup()
selectedmodel <- params$modeltype

models <- byStation %>%
  expand_grid(modeltype = selectedmodel) %>%

  mutate(modelfunctionname = paste(modeltype, "model", sep = "_")) %>%
  # add functions for model calculation
  mutate(modelfunctions = map(modelfunctionname, get)) %>%
  # add models based on data and functions
  mutate(model = pmap(
    list(
      data,
      modelfunctions
    ),
    \(.d, .f) .f(.d)
  )) %>%
  mutate(
    glance = map(model, broom::glance),
    rsq    = glance %>% map_dbl("r.squared"),
    adj.rsq = glance %>% map_dbl("adj.r.squared"),
    AIC    = glance %>% map_dbl("AIC"),
    tidy   = map(model, broom::tidy),
    augment = map(model, broom::augment),
    equation = map(model, function(x) equatiomatic::extract_eq(x))
  )
eq <- models %>% 
  distinct(modeltype, equation) %>%
  mutate(equation = paste0("$", equation, "$"))

knitr::kable(eq, escape = F)
modeltype equation
linear \(\operatorname{height} = \alpha + \beta_{1}(\operatorname{year\ -\ epoch}) + \beta_{2}(\operatorname{cos(2\ *\ pi\ *\ (year\ -\ epoch)/(18.613))}) + \beta_{3}(\operatorname{sin(2\ *\ pi\ *\ (year\ -\ epoch)/(18.613))}) + \epsilon\)
broken_linear \(\operatorname{height} = \alpha + \beta_{1}(\operatorname{year\ -\ epoch}) + \beta_{2}(\operatorname{from1993}) + \beta_{3}(\operatorname{cos(2\ *\ pi\ *\ (year\ -\ epoch)/(18.613))}) + \beta_{4}(\operatorname{sin(2\ *\ pi\ *\ (year\ -\ epoch)/(18.613))}) + \epsilon\)
broken_squared \(\operatorname{height} = \alpha + \beta_{1}(\operatorname{year\ -\ epoch}) + \beta_{2}(\operatorname{from1960\_square}) + \beta_{3}(\operatorname{cos(2\ *\ pi\ *\ (year\ -\ epoch)/(18.613))}) + \beta_{4}(\operatorname{sin(2\ *\ pi\ *\ (year\ -\ epoch)/(18.613))}) + \epsilon\)

Autocorrelation

Autocorrelation with previous year(s) can sometimes explain part of the otherwise unexplained variance. Especially when a trend is detected in the data, autocorrelation with relatively short lags (1 or few years) often occurs. In case of autocorrelation, we recalculate standard errors of the estimated parameters accordingly.

models %>%
  mutate(
    ACF = map(augment, function(x) fortify(acf(x$.resid, plot = F)))
  ) %>%
  unnest(ACF) %>%
  mutate(ACF_pass = (ACF >= lower & ACF <= upper)) %>%
  filter(Lag >= 1) %>%
  ggplot(aes(Lag, ACF)) +
  geom_col(width = 0.4, aes(fill = ACF_pass)) +
  geom_vline(xintercept = 8.9, linetype = 3) +
  geom_vline(xintercept = 18.6, linetype = 3) +
  geom_line(aes(y = lower), linetype = "dotdash", linewidth = 0.5) +
  geom_line(aes(y = upper), linetype = "dotdash", linewidth = 0.5) +
  scale_x_continuous(
    breaks = scales::breaks_pretty(10)
    )+
  facet_grid(station ~ modeltype) +
  theme_minimal() +
  theme(
    strip.text.y = element_text(angle = 0),
    legend.position = "bottom"
        )
Autocorrelation plot for selected stations and models.

Autocorrelation plot for selected stations and models.

There appears to be a consistent autocorrelation for all stations and models with a ‘lag’ of one year. The autocorrelation does not influence the value of the calculated trend parameters, but needs to be taken into account when calculating standard errors. The Newey West autocorrelatie term is used to correctly calculate the standard errors.

At station ‘Vlissingen’ there is autocorrelation with a ‘lag’ of 10 years. This could indicate an effect of the 8.8 year ‘lunar perigee cycle’. Because this only occurs at Vlissingen, this tidal component is not further accounted for in the analysis.

There is no apparent autocorrelation with a ‘lag’ of 18.6 years, the ‘nodal tide’ cycle. This is because the nodal tide is already incorporated in the three models.

# Using NeweyWest():
require(sandwich)

models <- models %>%
  mutate(
    tidy.HAC = map(
      model, 
      function(x) broom::tidy(
        sqrt(
          diag(
            NeweyWest(
              x, 
              lag = 1, 
              prewhite = F, 
              adjust = T
            )
          )
        )
      )
    )
  )
    
  models$tidy.HAC <- lapply(models$tidy.HAC,
       function(x) {
         x %>%
           rename(
             term.HAC = names,
             st.err.HAC = x
           )
       }
)

Heteroskedasticity

Residuals distribution

The distribution of residuals resembles a normal distribution for most stations and model variants. Station Harlingen is an example where the distribution is out of centre when using the linear model. The width of the residuals distribution is narrower when measurements are corrected for surge prior to application of the models.

models %>%
  unnest(c(data, augment), names_sep = "_") %>% 
  mutate(
    surge_correction = case_when(
      params$wind_or_surge_type == "GTSM" ~ ifelse(data_year >= 1950, params$wind_or_surge_type, "none")
    )
  ) %>%
 ggplot(aes(x = augment_.resid)) +
  geom_density(
    aes(
      fill = surge_correction,
      color = surge_correction),
    # position = position_identity(), 
    alpha = 0.5, size = 1
  ) +
  facet_grid(station ~ modeltype) +
  geom_vline(xintercept = 0, alpha = 0.5) +
  theme(
    strip.text.y = element_text(angle = 0), 
    legend.position = "bottom"
  )

Variation of residuals over time

Distribution of residuals should not reveal a clear deviation from a horizontal line.

models %>%
  # filter(!grepl("Netherlands", station)) %>%
  # filter(station == "IJmuiden") %>%
  unnest(c(data, augment), names_sep = "_") %>% #str(max.level = 2)
ggplot(aes(data_year, augment_.resid)) +
  geom_point(alpha = 0.4) +
  facet_grid(station ~ modeltype) #+

Sea level rise

lookup <- c(
   Constant = "(Intercept)",
    Trend = "I(year - epoch)",
    u_nodal = "I(cos(2 * pi * (year - epoch)/(18.613)))",
    v_nodal = "I(sin(2 * pi * (year - epoch)/(18.613)))",
    `+ trend 1993` = "from1993",
    `+ square_trend 1960` = "from1960_square",
    AR_term = "previousYearHeight"
  )

# data wrangling. move to functions.R

all_predictions <- models %>%
  mutate(
    preds = map2(data, model, add_predictions)
  ) %>%
  dplyr::select(
    station,
    modeltype, 
    data, 
    tidy, 
    preds) %>%
  tidyr::unnest(c(data, preds), names_sep = "_") %>% 
  tidyr::unnest(tidy) %>%
    # str(max.level = 2)

  dplyr::select(-std.error, -statistic, -p.value) %>% # clean up
  tidyr::pivot_wider(
    names_from = term, 
    values_from = estimate
  ) %>%
  mutate(`data_height-surge_anomaly` = data_height - `preds_surge_anomaly`) %>%
  mutate(`preds_height-surge_anomaly` = preds_pred - `preds_surge_anomaly`) %>%
  rename(any_of(lookup)) %>%
  # str(max.level = 2)
  mutate(
    nodal_tide = 
      u_nodal * cos(2*pi*(data_year-epoch)/18.613) + 
      v_nodal * sin(2*pi*(data_year-epoch)/18.613),
    prediction_recalc = case_when(
      if("linear" %in% params$modeltype){
        modeltype == "linear" ~ 
          Constant + 
          Trend * (data_year - epoch)# + 
          # AR_term * data_previousYearHeight
      },
      if("broken_linear" %in% params$modeltype){
        modeltype == "broken_linear" ~ 
          Constant + 
          Trend * (data_year - epoch) +
          # AR_term * data_previousYearHeight +
          (data_year >= 1993) * `+ trend 1993` * (data_year - 1993)
      },
      # if("broken_quadratic" %in% params$modeltype){
      #   modeltype == "broken_quadratic" ~ Constant + Trend * (data_year - epoch) +
      #     ifelse(data_year >= 1960, from1960_square * (data_year - 1960) * (data_year - 1960), 0)
      # }
    )
    ) %>%
  select(
    station,
    modeltype,
    data_year,
    data_height,
    preds_year,
    prediction_recalc,
    `data_height-surge_anomaly`,
    `preds_height-surge_anomaly`,
    nodal_tide
  )
ggplot(
  all_predictions,
  aes(x = data_year)
) +
  geom_point(aes(y = data_height)) +
  geom_line(aes(y = prediction_recalc)) +
  facet_grid(station ~ modeltype)
  p <- plot_station( 
    predictions_all = all_predictions,
    stationi = unique(all_predictions$station),
    correctionVariant = "GTSM", 
    modelVariant = unique(all_predictions$modeltype), 
    printNumbers = F, 
    startyear = 1900
  ) +
  facet_grid(station ~ modeltype) +
  theme(legend.direction = "vertical",
        legend.box = "horizontal",
        legend.position = c(0.975, 0.025),
        legend.justification = c(1, 0),
        legend.title = element_blank())

  
  # ggplotly(p) %>% layout(legend = list(x = 0.05, y = 0.95))
  p

Parameters

## gebruik DT in plaats van kableextra

lookup.df <- data.frame(long_term = unname(lookup),
                        short_term = names(lookup))

library(DT)

parametertable <- models %>%
  select(station, modeltype, tidy) %>% 
  unnest(tidy) %>%
  left_join(models %>%
              select(station, modeltype, tidy.HAC) %>%
              unnest(tidy.HAC),
            by = c(
              station = "station",
              modeltype = "modeltype",
              term = "term.HAC"
            )
  ) %>%
  mutate(across(where(is.numeric), round, 3)) %>%
  left_join(lookup.df, by = c(term = "long_term")) %>%
  select(-term, term = short_term) %>%
  relocate(term, .after = modeltype)

parametertable %>%
  DT::datatable(
    options = list(
      "digits" = 3
    )
  )
  # kableExtra::kable(
  #   caption = "Coefficients for all models and stations.",digits = 2
  #   ) %>% 
  # kableExtra::scroll_box(height = "500px")

Which model is the preferred model?

Is there a significant acceleration?

acc_broken_linear <- parametertable %>%
  filter(modeltype == "broken_linear") %>%
  filter(term == "+ trend 1993") %>%
  select(station, p.value )
knitr::kable(acc_broken_linear, caption = "Significance for the acceleration term in the broken linear model for all stations. ")
Significance for the acceleration term in the broken linear model for all stations.
station p.value
Vlissingen 0.132
Hoek van Holland 0.038
Den Helder 0.000
Delfzijl 0.000
Harlingen 0.000
IJmuiden 0.615
Netherlands 0.000
Netherlands (without Delfzijl) 0.000

For the broken linear model, there is a significant acceleration starting in the year 1993 when fitting the average sea level combined for all stations without Delfzijl. For individual stations, the acceleration is not significant for the stations Vlissingen, Hoek van Holland and IJmuiden.

acc_broken_linear <- parametertable %>%
  filter(modeltype == "broken_squared") %>%
  filter(term == "+ square_trend 1960") %>%
  select(station, p.value )
knitr::kable(acc_broken_linear, caption = "Significance for the acceleration term in the broken squared model for all stations. ")
Significance for the acceleration term in the broken squared model for all stations.
station p.value
Vlissingen 0.675
Hoek van Holland 0.012
Den Helder 0.000
Delfzijl 0.000
Harlingen 0.000
IJmuiden 0.811
Netherlands 0.000
Netherlands (without Delfzijl) 0.001

For the broken squared model, there is a significant acceleration starting in the year 1960 when fitting the average sea level combined for all stations without Delfzijl. For individual stations, the acceleration is not significant for the stations Vlissingen, Hoek van Holland and IJmuiden.

Which model has the lowest Akaike Information Criterion (AIC)?

Of the two models with an acceleration term, the model with lowest AIC is the preferred model.

models %>%
  # filter(station == "Netherlands (without Delfzijl)") |>
  mutate(station = as.character(station)) %>%
  select(station, modeltype, AIC) %>%
  # unite(`modeltype x station`, modeltype, station) %>%
  arrange(-AIC) %>% 
  mutate(modeltype = factor(modeltype, levels=config$runparameters$modeltype)) %>%
  mutate(station = factor(station, levels=config$runparameters$station)) %>%
  ggplot(aes(x = modeltype, y = AIC)) +
  geom_point(size = 3, shape = "|") +
  coord_flip() +
  facet_wrap("station")

For the combined stations Netherlands and Netherlands (without Delfzijl), the non linear model has the lowest AIC, and is therefore the first candidate for the preferred model. In the next section, it is tested whether the non-linear model explains the observed variation significantly better than the simplest model, the linear model.

For stations Den Helder and Hoek van Holland, the broken squared model is the model with lowest AIC.

At station IJmuiden, all three models have similar AIC.

Considering all stations, the broken linear and the broken squared model describe the observed variation approximately equally well.

Is the preferred model significantly better than the linear model?

The broken linear model is chosen as the preferred candidate because it gave a better explanation of the observation, corrected for the degrees of freedom of the model (AIC criterion). Here, it is tested whether the broken linear model is significantly better model that the most simple model, the broken linear model.

bl <- models %>% 
  filter(
    station == "Netherlands (without Delfzijl)",
    modeltype == "broken_linear"
  ) %>%
  select(model) %>%
  unlist(recursive = F) %>%
  unname()

l <- models %>% 
  filter(
    station == "Netherlands (without Delfzijl)",
    modeltype == "linear"
  ) %>%
  select(model) %>%
  unlist(recursive = F) %>%
  unname()

t <- anova(l[[1]], bl[[1]])

# broom::tidy(t) %>% 
#   select(term, rss, p.value)

p_value <- t$`Pr(>F)`[2]

if(p_value<0.01) {
  alternativemodelaccepted = TRUE
  alternativemodelrefused = FALSE
}

The Anova table for model comparison is shown below.

# stargazer::stargazer(
#   t, 
#   type = 'html',
#   column.sep.width = '20pt'
#   )

t

The acceleration model (broken linear) has one more degree of freedom as the linear model. The broken linear model is significantly better than the linear model (p < 0.001).

Conclusions

Based on the above analysis, the following conclusions are drawn:

Based on variance analysis the broken linear model is significantly better than the linear model. the broken linear model is accepted as the preferred model.